Apparatus and method for time-alignment of two signals

ABSTRACT

A pitch period is determined from a recently received portion of a transmitted signal. The determined pitch period is then used to search the originally transmitted signal for similarities over a range of delays. The delay that yields the greatest similarity is taken as the delay between the transmitted signal and the received signal. Once this delay is known, the received signal can be time-aligned with the delayed transmitted signal.

TECHNICAL FIELD

[0001] This invention relates to the transmission of digitally encoded voice, and in particular, to the alignment of digital voice signals.

BACKGROUND OF THE INVENTION

[0002] In the transmission of digitally encoded voice, it is often important to be able to time-align the originally transmitted encoded voice information (also referred to as a transmitted signal) with the received encoded voice information (also referred to as a received signal) after the voice information has been transported through a switching network. The switching network can be a traditional circuit-switched network or a packet-switched network such as an ATM, frame relay, or Internet network. One use for time alignment of the original digitally encoded voice with the received digitally encoded voice is speech quality assessment. Until recently, the only way to measure users' perception of the quality of voice transmission systems was to conduct subjective tests in which human listeners made the judgments. However, subjective tests are expensive and slow, and they cannot be used in certain applications such as in-service monitoring. Various objective models, based on human perception, were therefore developed with the aim of predicting the results of human subjective tests, and various algorithms have been proposed to assess the perceived quality of transmitted digital voice. The most promising of these algorithms is the perceptual evaluation of speech quality (PESQ), which has become the basis for International Telecommunication Union (ITU-T) Recommendation P.862. This new standard requires the time alignment of a received digitally encoded voice signal with a transmitted digitally encoded voice signal. The method proposed in this standard for performing the time alignment of the two voice signals uses a complicated splitting of speech utterances within the overall speech signal to re-align incorrectly aligned samples. Such a technique results in a complex and expensive alignment algorithm.

SUMMARY OF THE INVENTION

[0003] This invention is directed to an apparatus and method that solve the problems and disadvantages of the prior art. A pitch period is determined from a recently received portion of a transmitted signal. The determined pitch period is then used to search for similarities in periods of the originally transmitted signal. This search is performed over a range of delays. The delay that yields the greatest similarity is taken as the delay between the transmitted signal and the received signal. Once this delay is known, the received signal can be time-aligned with the delayed transmitted signal.

BRIEF DESCRIPTION OF THE DRAWING

[0004] FIG. 1 illustrates an embodiment of the invention;

[0005] FIG. 2 illustrates an embodiment of a time-alignment unit;

[0006] FIG. 3 illustrates, in flowchart form, an embodiment of the invention; and

[0007] FIG. 4 illustrates another embodiment of the invention.

DETAILED DESCRIPTION

[0008] FIG. 1 illustrates a system in which a voice signal (digitally encoded information) is input to voice transmit unit 101 and transported via network 102 and voice receive unit 103 to output 107. Time-alignment unit 104 is responsive to input signal 106 and the signal of output 107 to produce output 108, which is input signal 106 time-shifted by the transmission delay so as to match output signal 107. In addition, time-alignment unit 104 produces output 109, which designates the amount of delay between input signal 106 and output signal 107.

[0009] FIG. 2 illustrates time-alignment unit 104 in greater detail. Block 201, also referred to as a time delay calculator, calculates the time delay between the transmitted signal 106 and the received signal 107. Greater detail on how this time delay is calculated is given with respect to FIG. 3. Block 201 outputs the calculated time delay via output 109. Buffer 202, which buffers transmitted signal 106, utilizes the calculated time delay to determine what portion of buffer 202 matches the present received signal 107. The output of buffer 202 is transmitted on output 108. Quality assessment equipment utilizes the signal of output 108 and the input signal to input 106 to determine the perceptual quality of the transmission through units 101 and 103 and network 102.

[0010] As is well known to one skilled in the art, output 107 will output a signal that is a distorted version of the input/transmitted signal 106. Not only is output 107 time delayed from signal 106 but distortion is also introduced by elements 101-103. Primarily, it is this distortion that is to be analyzed by a perceptual speech assessment unit utilizing outputs 107 and 108.

[0011] FIG. 3 illustrates, in flowchart form, steps performed in implementing an embodiment of the invention. FIG. 3 illustrates the operation of an embodiment of blocks 201 and 202 of FIG. 2. Once started from block 301, block 302 calculates the pitch period for a buffer of samples of the received signal outputted from block 103 of FIG. 1. Next, block 303 utilizes this calculated pitch period to compute the similarities between the transmitted signal and the received signal over a range of delays. For each particular delay, the similarity is determined. Once the similarities have been determined for each delay within the range of delays, block 304 chooses the delay that produces the greatest similarity for the calculated period within the received and transmitted signals.

[0012] Block 305 then utilizes this chosen delay to control buffer 202 of FIG. 2 so that the output 108 is properly aligned with the received signal. Block 306 indicates that the operations are completed.
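The way a chosen delay might select the matching portion of a buffer of transmitted samples can be sketched as follows. This is only an illustrative sketch, not the implementation of buffer 202: the class name `DelayBuffer`, its methods, and the fixed capacity are hypothetical, and the delay is assumed to be expressed as a whole number of samples.

```python
from collections import deque


class DelayBuffer:
    """Hypothetical stand-in for a buffer of recent transmitted samples."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest sample once it is full.
        self._samples = deque(maxlen=capacity)

    def push(self, sample):
        """Store the newest transmitted sample."""
        self._samples.append(sample)

    def read(self, delay):
        """Return the sample that was pushed `delay` samples before the newest one."""
        if not 0 <= delay < len(self._samples):
            raise IndexError("delay exceeds buffered history")
        return self._samples[-1 - delay]
```

With such a buffer, each new received sample would be paired with `read(delay)`, so the capacity has to be at least as large as the largest delay that is searched.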

[0013] Consider now in greater detail the operations performed by block 302 of FIG. 3 in calculating the pitch period for the received signal. The assumption is made that, for human speech sampled at 8 kHz, the possible pitch period values, T, range from a minimum, $T_{min}$, of 19 samples to a maximum, $T_{max}$, of 140 samples. Within this range, a score S(T) is calculated for each possible pitch period by the following calculation:

$\begin{matrix} {S(T) = \frac{1}{T}\sum\limits_{n = 0}^{T} \left| x[n] - x[n - T] \right|} & \text{Equation 1} \end{matrix}$

[0014] This equation sums the absolute differences between samples of the current period and the corresponding samples of the previous period over a range of T samples and then divides the result by T. The optimal pitch period, $T_{opt}$, is the value of T that gives the minimum S(T).
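The pitch search of Equation 1 can be sketched in a few lines. The sketch below is illustrative only and rests on stated assumptions: the function names and the `origin` parameter are hypothetical, `origin` marks the buffer position treated as n = 0, and the buffer is assumed to hold at least `t_max` samples on either side of it so that both x[n] and x[n - T] exist. The sum runs over n = 0..T inclusive, as the equation is printed.

```python
T_MIN = 19   # samples; minimum pitch period stated in the text for 8 kHz speech
T_MAX = 140  # samples; maximum pitch period stated in the text


def pitch_score(x, origin, T):
    """Equation 1: mean absolute difference between samples one period T apart."""
    total = sum(abs(x[origin + n] - x[origin + n - T]) for n in range(T + 1))
    return total / T


def estimate_pitch_period(x, origin, t_min=T_MIN, t_max=T_MAX):
    """Return the candidate period in [t_min, t_max] with the smallest score."""
    if origin - t_max < 0 or origin + t_max >= len(x):
        raise ValueError("buffer too short around origin")
    # min() keeps the first candidate on ties, so the shortest period wins.
    return min(range(t_min, t_max + 1), key=lambda T: pitch_score(x, origin, T))
```

For an exactly periodic input, every multiple of the true period scores zero; scanning upward from `t_min` and keeping the first minimum then returns the shortest such period.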

[0015] Consider now in greater detail the operations performed by block 303 in one embodiment of the invention. The operations of block 303 are illustrated in the following two equations:

$\begin{matrix} {A[d] = \frac{\sum\limits_{n = 0}^{T_{opt}} \left[ x[n - d]\, y[n] \right]}{\sum\limits_{n = 0}^{T_{opt}} \left[ x[n - d] \right]^{2}}} & \text{Equation 2} \\ {P[d] = \frac{\sum\limits_{n = 0}^{T_{opt}} \left[ y[n] - A[d]\, x[n - d] \right]}{\sum\limits_{n = 0}^{T_{opt}} \left[ y[n] \right]^{2}}} & \text{Equation 3} \end{matrix}$

[0016] These equations are calculated for a range of delays from $D_{min}$ to $D_{max}$. The two equations calculate a value that is essentially the periodicity between the transmitted and received signals. The first of these two equations, equation 2, calculates a value, A[d], for each of the delays within the range. Each of these values of A is then utilized to calculate a value, P[d], for each of the delays over the range of delays by repeatedly calculating equation 3. To determine the optimal delay, $D_{opt}$, the following equation is calculated:

$\begin{matrix} {D_{opt} = \underset{D_{min} \leq d \leq D_{max}}{\arg\min}\; P[d]} & \text{Equation 4} \end{matrix}$

[0017] $D_{opt}$ is equal to the delay whose value, P[d], is the smallest. Equation 4 implements block 304 of FIG. 3. Buffer 202 of FIG. 2 is then responsive to the value, $D_{opt}$, to align the transmitted signal with the received signal.
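The delay search of Equations 2 through 4 can be sketched as follows. Several points here are assumptions rather than statements of the specification: x is taken to be the transmitted signal and y the received signal, both indexed on a common time base; the hypothetical `origin` parameter marks n = 0; delays are non-negative; and the numerator of Equation 3, which is printed without an exponent, is computed here as a squared residual so that P[d] is non-negative and its minimum is meaningful. Under that reading, A[d] from Equation 2 is the least-squares gain that best scales the delayed x onto y.

```python
def gain(x, y, origin, d, T):
    """Equation 2: scale factor A[d] between x delayed by d and y."""
    num = sum(x[origin + n - d] * y[origin + n] for n in range(T + 1))
    den = sum(x[origin + n - d] ** 2 for n in range(T + 1))
    return num / den if den else 0.0  # silent window: no usable gain


def mismatch(x, y, origin, d, T):
    """Equation 3, with the numerator residual squared (an assumption)."""
    a = gain(x, y, origin, d, T)
    num = sum((y[origin + n] - a * x[origin + n - d]) ** 2 for n in range(T + 1))
    den = sum(y[origin + n] ** 2 for n in range(T + 1))
    return num / den if den else float("inf")


def find_delay(x, y, origin, T, d_min, d_max):
    """Equation 4: the delay in [d_min, d_max] whose P[d] is smallest."""
    if d_min < 0 or origin - d_max < 0 or origin + T >= min(len(x), len(y)):
        raise ValueError("window does not fit inside the buffers")
    return min(range(d_min, d_max + 1),
               key=lambda d: mismatch(x, y, origin, d, T))
```

Here `T` would be the pitch period found by block 302, so only about one period of samples is compared at each candidate delay, which keeps the cost of the search proportional to the pitch period times the number of delays tried.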

[0018] FIG. 4 illustrates another embodiment of the invention. Elements 401-403 and elements 407-409 are identical in function to elements 101-103 and elements 107-109. Switching network 411 has a delay with no variation or only a small variation and has known distortion characteristics. One such switching network is a circuit-switched network, as is well known to those skilled in the art. By switching output 407 back to time-alignment unit 404 via switching network 411, it is possible to determine the time shift between input 406 and output 407 when elements 401 and 403 are separated by a great distance. Time-alignment unit 404 functions in the same manner as time-alignment unit 104 except that output 412 indicates the delay time through elements 401-403 and output 408 indicates the delay time through elements 401-403 and element 411.

[0019] Of course, various changes and modifications to the illustrated embodiments described above will be apparent to those skilled in the art. These changes and modifications can be made without departing from the spirit and scope of the invention and without diminishing its intended advantages. It is therefore intended that such changes and modifications be covered by the following claims except insofar as limited by the prior art.

What is claimed is:
 1. A method for determining time alignment between two signals, comprising the steps of: calculating a pitch period in a first signal; comparing the calculated pitch period with the second signal to determine an occurrence of the calculated pitch period in the second signal; determining a delay between first and second signals; and aligning the first and second signals using the determined delay.
 2. The method of claim 1 wherein the second signal is transmitted and the first signal is the transmitted second signal.
 3. The method of claim 2 further comprising transmitting the second signal through a switching network; and receiving the transmitted second signal and designating the transmitted second signal as the first signal.
 4. The method of claim 3 further comprising transmitting the first signal through a second switching network before the step of calculating is performed.
 5. The method of claim 4 wherein delay and distortion of the second switching network are known.
 6. The method of claim 4 wherein the second switching network is a circuit switching network.
 7. The method of claim 2 wherein the step of comparing comprises the steps of computing similarities between the pitch period of the first signal and the second signal; and using the highest similarity to determine the delay.
 8. The method of claim 7 wherein the step of computing is performed over a range of delays.
 9. The method of claim 1 wherein the step of aligning comprises the steps of buffering the second signal; and controlling the step of buffering with the delay to align the first and second signals.
 10. The method of claim 9 wherein the second signal is transmitted and the first signal is the transmitted second signal.
 11. The method of claim 10 wherein the step of comparing comprises the steps of computing similarities between the pitch period of the first signal and the second signal; and using the highest similarity to determine the delay.
 12. The method of claim 11 wherein the step of computing is performed over a range of delays.
 13. The method of claim 1 wherein the step of comparing comprises the steps of computing similarities between the pitch period of the first signal and the second signal; and using the highest similarity to determine the delay.
 14. The method of claim 13 wherein the step of computing is performed over a range of delays.
 15. The method of claim 13 wherein the step of aligning comprises the steps of buffering the second signal; and controlling the step of buffering with the delay to align the first and second signals.
 16. An apparatus for time aligning a signal transmitted through a switching network with the signal received from the switching network, comprising: a transmitter for transmitting the signal; the switching network communicating the transmitted signal; a receiver for receiving the transmitted signal from the switching network; a time delay calculator responsive to the received signal for determining a pitch period in the received signal and comparing the calculated pitch period with the signal to determine a delay between the signal and the received signal; a buffer for storing the signal; and the buffer responsive to the determined delay to time align the signal with the received signal.
 17. The apparatus of claim 16 wherein the time delay calculator further performs the comparison by computing similarities between the pitch period of the received signal and the signal and using the highest similarity to determine the delay.
 18. The apparatus of claim 17 wherein the computing is performed over a range of delays.
 19. The apparatus of claim 16 further comprising a second switching network, the receiver communicating the received signal to the time delay calculator via the second switching network.
 20. The apparatus of claim 19 wherein delay and distortion of the second switching network are known.
 21. The apparatus of claim 19 wherein the second switching network is a circuit switching network.
 22. The apparatus of claim 20 wherein the buffer uses the delay of the second switching network to time align the signal and received signal.
 23. The apparatus of claim 16 wherein the time delay calculator outputs the delay.
 24. An apparatus for determining time delay between a signal and a received signal, comprising: means for transmitting the signal; means for communicating the transmitted signal to a means for receiving the signal; means for calculating a pitch period in the received signal; and means for searching the signal with the pitch period to determine the time delay.
 25. The apparatus of claim 24 further comprising means for buffering the signal; and means responsive to the time delay for aligning the buffered signal with the received signal.
 26. The apparatus of claim 25 wherein the means for receiving further transmits the received signal to the means for calculating via a switching network.
 27. The apparatus of claim 26 wherein the switching network is a circuit switching network. 